ABSTRACT

We present in this study the architectural specification and 
feasibility determination for a real-time contour surface display 
generator. We begin by examining a recently reported, highly 
decomposable algorithm for contour surface display generation. 
We establish a piece of the total algorithm as the algorithm com- 
ponent. The algorithm component is that part of the algorithm 
that can be executed in parallel, independently from the computa- 
tions performed on any other algorithm subpart. We propose an 
architecture for the algorithm component, and model that archi- 
tecture in order to determine the real-time capability of the algo- 
rithm. We then model the larger system of multiple algorithm 
component processors. This modeling effort is performed with
respect to a particular application requiring real-time contour sur- 
face display generation. A VLSI feasibility computation is then per- 
formed on the proposed architecture. The study ends with a look 
at the impact of real-time contour surface display generation on 
the design of the graphics display system. 

Categories and Subject Descriptors: I.3.1 [Hardware Architecture]:
architectures, parallel processing, VLSI implementations; I.3.2
[Graphics Systems]: multiprocessing systems; I.3.3
[Picture/Image Generation]: surface visualization; I.3.5 [Computa-
tional Geometry and Object Modeling]: data structures, discrete
planar contours, modeling molecules, surface approximation, sur-
face generation, surface representation, surfaces, 3D graphics;
I.3.6 [Methodology and Techniques]: contouring, interactive sys-
tems, parallel processing; I.3.7 [Three-Dimensional Graphics and
Realism]: line drawings, line generation algorithms, real-time
graphics, surface plotting, surface visualization, surfaces; I.3.m [
1. Introduction

Contour surface display generation is one of the most frequently used 
graphics algorithms [Barry, 1979], [Faber, 1979], [Wright, 1979], [Zyda, 1984a], 
[Zyda, 1984b], [Zyda, 1983], [Zyda, 1982], [Zyda, 1981]. A contour surface display
is a visual representation of a surface by the collection of lines formed when 
that surface is intersected by a set of parallel planes. The lines formed on each 
of those planes are called contours. A contour represents the set of points that 
belong to both the surface and the particular intersecting plane. Contour sur- 
face displays are used in X-ray crystallography, computer-aided tomography, 
and other applications for which grid data is collected. Contour surface display 
generation is generally depicted as a computationally slow operation whose out- 
put is sent to a plotter or film recorder. A number of papers have been written 
documenting "breakthroughs" that increase the speed of contour surface 
display generation. One author has reported that his contour surface display 
generation subroutine used one second of central processor time on NCAR’s Con- 
trol Data 7600 [Wright, 1979]. Although a contour surface display generation pro- 
gram of this speed is useful for static situations, it is found to be lacking for 
interactive applications that generate a succession of contour surface displays 
in response to contour level changes read from a control dial. 

Interactive applications that cause the generation of a succession of images 
require that the human intervention be acknowledged by a visual change to the 
current display within a finite element of time, called real-time. For a system 
that generates a new contour surface display in response to human intervention, 
real-time means that we must be able to produce and distribute a new picture in 
the amount of time it takes the graphics hardware to change display frames. 
This is typically one-thirtieth of a second. Any greater amount of time is dis- 
cernable by the viewer, either as a flicker or a hesitation in the picture update. 
In fact, one-thirtieth of a second is discernable to many people, making one-
sixtieth of a second a more desirable time for the change of display frames
[Newman, 1979]. 

One application in which real-time contour surface display generation is 
important is the determination of molecular structures from the electron den- 
sity data generated by X-ray crystallography [Barry, 1979]. Such an operation is 
executed interactively by using a computer graphics program that displays a 
Dreiding (stick) model of the molecule, inside a contour surface display of the 
corresponding region of the molecule’s electron density grid. In addition to the 
graphics function, the computer program monitors a series of signals generated 
by the user, while the user is turning the various knobs on a control console 
[Zyda, 1980]. The values read from these knobs are interpreted by the program 
as modifications to either the molecule or the surface display. Modifications to 
the molecule take the form of bond rotations or bond lengthenings. 
Modifications to the contour surface display take the form of an increase or 
decrease of the contour level. The goal of this process is to produce the stick 
model of the molecule that best fits inside the given electron density data set. 
The user can determine whether or not the model fits the density grid by modi- 
fying the contour level, shrinking the contour surface to the molecule. Simi- 
larly, the user can expand the contour surface from the stick model for better 
visibility. This function requires that the hardware have the capability to rapidly 
change the contour display as its contour level changes. 

We know from [Zyda, 1984a] that the generation of a contour surface
display, such as those required by the above application, cannot be accom- 
plished in real-time using a conventional uniprocessor. This failure is due to the 
fact that contour surface display generation algorithms require many more 





Figure 1 

Contour Surface Display Generated from a Hydrogen Atom 
Wavefunction Squared (3dxy orbital) 
instructions executed per second than can be provided by currently available
uniprocessors. In the past, this limitation of the conventional processor has 
relegated such applications to either the non real-time environment (waiting a 
few minutes for each display), or to the equally unsatisfying environment of 
motion picture film. Because of this, this study looks for multiprocessor solu- 
tions to the real-time contour surface display generation problem. At the 
present time, efficient multiprocessor solutions generally mean VLSI solutions. 
Consequently, the multiprocessor architectures examined in this study are 
those implementable in the VLSI technologies. 

2. Definitions and Decomposability 

A contour surface is a visual display that represents all points in a particu- 
lar region of three-space <x,y,z> which satisfy the relation f(<x,y,z>)=k, where k 
is a constant known as the contour level. The function f represents a physical 
quantity which is defined over the three-dimensional volume of interest. The 
visual display created by this algorithm is the collection of lines that belong to 
the intersection of both the set of points that satisfy the relation f(<x,y,z>)=k, 
and a set of regularly spaced parallel planes that pass through the region of 
three-space for which the relation is defined.

For this study, the function f is approximated by a discrete, three- 
dimensional grid created by sampling that function over the volume of interest. 
The three-dimensional grid contains a value at each of its defined points that 
corresponds to the physical quantity obtained from the function, i.e. the value 
associated with point (x_0,y_0,z_0) is v_0, where f(x_0,y_0,z_0)=v_0. In order to minimize
confusion, we will specify the value at a particular grid point (x,y,z) by a(x,y,z), 
and will specify the value at a particular point (x,y,z) of the function by f(x,y,z). 

The visual display of the contour surface is created from this three- 
dimensional grid by taking two-dimensional slices of the grid, and constructing 
the two-dimensional, planar contours for each slice at the designated contour 
level. A slice of a three-dimensional grid is a planar, orthogonal, two- 
dimensional grid assigned a constant coordinate in three-space, i.e. an x-y slice 
of a(<x,y,z>) corresponds notationally to a(<x,y>) for a particular z coordinate. 
The two-dimensional, planar contours created are the lines that satisfy the rela- 
tion a(<x,y,z>)=k for a particular planar coordinate, either x, y, or z, where 
again k is the constant contour level. If we contour all x-y slices of the three- 
dimensional grid at contour level k, we will have a stack of parallel contours 
approximating the contour surface, each planar set of contours corresponding 
to a particular z coordinate. If we contour all x-z slices of the three dimensional 
grid, we again will have a stack of parallel contours approximating the contour 
surface, each planar set of contours corresponding to a particular y coordinate. 
Likewise, if we contour all y-z slices of the three-dimensional grid, we will have a 
stack of parallel contours approximating the contour surface, each planar set of 
contours corresponding to a particular x coordinate. The assemblage of the 
three sets of parallel, planar contours, i.e. the simultaneous display of all the 
contours created for the x-y, x-z, and y-z planes of the three-dimensional grid, 
produces a "chicken-wire-like" contour surface display (see Figure 1). The 
three-dimensional contour surface display described in this study is created by 
such a procedure. 

A decomposable algorithm for contour surface display generation has been 
described in [Zyda, 1984b]. That algorithm is constructed from a two- 
dimensional contouring algorithm that is used to contour all the possible planar, 
orthogonal, two-dimensional grids of a larger three-dimensional grid. The two- 
dimensional contouring algorithm of that paper is comprised of components, 
Figure 2

Example Contour Grid with Contours Drawn for Level 50 




Figure 3 

Example Contour Grid with Contours Drawn for Level 100 
called algorithm components, that operate on individual 2x2 subgrids of a
larger two-dimensional grid. In the algorithm, the computations necessary for 
generating the contour lines for a single 2x2 subgrid are independent from 
those required for any other 2x2 subgrid. (Note: a 2 x 2 subgrid is defined to be 
that portion of the two-dimensional grid bounded by four adjacent grid points. 
In the two-dimensional grid of Figure 2, the lower, lefthand 2x2 subgrid is 
bounded by points (1,1), (2,1), (2,2), and (1,2).) If we compute the contours 
corresponding to contour level k for all 2 x 2 subgrids of a two-dimensional grid, 
then we will have determined the complete set of contours for that grid. If we 
compute the contours corresponding to contour level k for all possible 2x2 
subgrids of the larger three-dimensional grid, then we will have the complete 
contour surface display for that grid. We use this formulation for the contouring 
algorithm in this study. 
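
The decomposition described above can be sketched in Python. This is only an
illustration under stated assumptions: grid points are numbered from 1 as in
Figure 2, and the names count_subgrids and xy_subgrids are hypothetical, not
taken from [Zyda, 1984b].

```python
from itertools import product


def count_subgrids(nx, ny, nz):
    """Number of 2x2 subgrids over all x-y, x-z, and y-z slices of an
    nx by ny by nz grid (each subgrid is one algorithm component)."""
    xy = nz * (nx - 1) * (ny - 1)   # one x-y slice per z coordinate
    xz = ny * (nx - 1) * (nz - 1)   # one x-z slice per y coordinate
    yz = nx * (ny - 1) * (nz - 1)   # one y-z slice per x coordinate
    return xy + xz + yz


def xy_subgrids(nx, ny, z):
    """Yield the four bounding points of every 2x2 subgrid of one x-y slice,
    counterclockwise from the lower lefthand point."""
    for x, y in product(range(1, nx), range(1, ny)):
        yield ((x, y, z), (x + 1, y, z), (x + 1, y + 1, z), (x, y + 1, z))


if __name__ == "__main__":
    print(next(xy_subgrids(4, 5, 1)))   # lower lefthand subgrid of a 4x5 plane
    print(count_subgrids(4, 5, 1))      # subgrids in a single 4x5 plane
```

Because no subgrid depends on any other, the tuples produced by xy_subgrids
could be handed to separate processors in any order.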

3. The Contouring Tree 

The contouring algorithm in [Zyda, 1984b] is based upon a data structure
called the contouring tree. A contouring tree represents the edge value rela- 
tionships of a 2 x 2 subgrid in a form that permits the rapid generation of the 
contour display for any contour level contained within the represented subgrid 
(see Figure 4). The formulation of the contouring tree is based upon the obser- 
vation that for any two-dimensional grid a continuous series of contour displays 
can be created for contour levels in the range of the minimum and maximum 
grid values (see Figure 5, and [Zyda, 1984a], [Zyda, 1984b], [Zyda, 1983],
[Zyda, 1982], [Zyda, 1981]). 

The use of the contouring tree is outlined best with an example of a small 
two-dimensional grid. Figures 2 and 3 depict the contours generated for con- 
tour levels 50 and 100. The contours at level 100 are closed contours, forming 
simple, connected loops. The contours at level 50 are open contours. Figures 4 
and 6 present the contouring trees created for two 2x2 subgrids of the 4x5 
plane. The edges of the contouring trees correspond to the directed, downhill 
edges inscribed on the 2x2 subgrids of the figures. There are eight directed 
edges on each subgrid, four for the boundary edges and four for the edges to the 
subgrid’s center point. The value used for the center point is the average of the 
four values comprising the corners of the 2x2 subgrid. (A reference as to the 
usefulness of the center point average value in generating smooth contours is 
found in [Sutcliffe, 1980].) The edges of the contouring trees are ordered, main-
taining the same counterclockwise ordering as in the original subgrids. A "1"
under a node indicates that a setpoint display command should be generated for
any coordinate that is created along an edge that has that connectivity on its
lower valued node. A "0" indicates a drawto display command in a similar
fashion and a "2" indicates a drawpoint.

Display generation from a contouring tree is accomplished by performing a 
pre-order traversal of that contouring tree, producing a coordinate and drawing 
instruction whenever the desired contour level is found to be within the range of 
an edge of the contouring tree. A pre-order traversal visits the root, the left 
subtree, the middle subtree, and then the right subtree. An edge’s range is 
defined to be the set of values between those associated with the nodes on either 
end of the edge. More precisely, we say a contour level is within an edge if the
following condition holds:

lower_node's_value <= contour_level < higher_node's_value

For example, in Figure 4a at contour level 100, we issue coordinates and drawing 
instructions for the edges (2,2)-(3,2), (2,2)-(2.5,2.5), and (2,2)-(2,3). The drawing 
SAMPLE CONTOURING TREE FOR A 2 X 2 SUBGRID

Level 50

     X         Y         Z
   2.9091    2.0000    1.0000
   2.8333    2.1667    1.0000
   3.0000    2.5000    1.0000
   2.6667    3.0000    1.0000
   2.2500    2.7500    1.0000
   2.0000    2.8333    1.0000

Level 100

     X         Y         Z
   2.4545    2.0000    1.0000
   2.3125    2.3125    1.0000
   2.0000    2.4167    1.0000

Column D is the drawing command, i.e. 1 = SETPOINT, 0 = DRAWTO.

Figure 4b

Coordinates Generated for Sample 2x2 Subgrid







Figure 5 

Example Contour Grid with Contours Drawn for Multiple Contour Levels 
SAMPLE CONTOURING TREE FOR A 2 X 2 SUBGRID WITH SADDLE POINT

Tree rooted at value 90

Level 50

     X         Y         Z       D
   3.0000    1.8000    1.0000    1
   2.8824    1.8824    1.0000    0
   2.0000    1.0000    1.0000    1
   2.0000    1.0000    1.0000    0

Level 100

     X         Y         Z       D
   no coordinates generated

Tree rooted at value 150

Level 50

     X         Y         Z       D
   2.0000    1.0000    1.0000    1
   2.0000    1.0000    1.0000    0
   2.8824    1.8824    1.0000    1
   2.9091    2.0000    1.0000    0

Level 100

     X         Y         Z       D
   2.0000    1.5000    1.0000    1
   2.3704    1.6296    1.0000    0
   2.4545    2.0000    1.0000    0

Column D is the drawing command, i.e. 1 = SETPOINT, 0 = DRAWTO.

Figure 6b

Coordinates Generated for Sample 2x2 Subgrid with Saddle Point
instruction issued for each of these edges is again the one associated with the
lower valued node of the edge. The coordinate for each of these edges is gen- 
erated by a linear interpolation of the edge’s endpoint coordinates according to 
the decrease in contour level along the edge. The coordinates and drawing 
instructions generated for the contouring trees of Figures 4a and 6a are 
represented in Figures 4b and 6b. 
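
The edge test and the linear interpolation can be illustrated with a short
Python sketch. The grid values used here (150 at (2,2), 40 at (3,2), 30 at
(2,3), and a center average of 70) are not stated in the text; they are values
inferred by working backward from the coordinates of Figure 4b and should be
read as an assumption. The function names are likewise hypothetical.

```python
def contains(level, lower_value, higher_value):
    """Edge test: lower_node's_value <= contour_level < higher_node's_value."""
    return lower_value <= level < higher_value


def interpolate(level, high_xy, high_value, low_xy, low_value):
    """Move from the higher valued node toward the lower valued node in
    proportion to the decrease in value needed to reach the contour level."""
    t = (high_value - level) / (high_value - low_value)
    return tuple(h + t * (lo - h) for h, lo in zip(high_xy, low_xy))


# Assumed values for the three edges leaving point (2,2) of the example subgrid.
root_xy, root_value = (2.0, 2.0), 150.0
lower_nodes = [((3.0, 2.0), 40.0),    # boundary edge (2,2)-(3,2)
               ((2.5, 2.5), 70.0),    # edge to the center point
               ((2.0, 3.0), 30.0)]    # boundary edge (2,2)-(2,3)

points = [interpolate(100.0, root_xy, root_value, xy, value)
          for xy, value in lower_nodes
          if contains(100.0, value, root_value)]

if __name__ == "__main__":
    for x, y in points:
        print(f"{x:.4f} {y:.4f}")
```

Under these assumed values the three points agree with the level 100
coordinates listed in Figure 4b.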

There are some subtleties not evident from the above that are best detailed 
using a pseudocode description of the traversal algorithm. Figure 7 depicts the 
traversal procedure for the contouring tree assuming a particular data organi- 
zation. The notation is quite standard. The pointers to the descendent nodes of 
NODE are LEFT(NODE), MIDDLE(NODE), and RIGHT(NODE). For each node of the
contouring tree, there are three pieces of information: the value associated with 
the node, VALUE(NODE), the coordinate associated with the node, XYZ(NODE), 
and the connectivity associated with the node, CONN(NODE). 

The generation of coordinates and drawing instructions from a contouring 
tree begins with routine CONTOUR_SUBGRID of Figure 7. That routine receives a 
pointer to the root node of the contouring tree. It then starts the traversal by 
calling routine VISIT with that root node. Routine VISIT checks to see if the edge 
defined by the passed in node and that node’s ancestor, NODE and ANCESTOR, 
contains the contour level. If the edge does contain the contour level, the edge 
intersection coordinate is computed using linear interpolation and issued to the 
display along with the connectivity associated with that node, CONN(NODE). If 
we issue a coordinate and connectivity for a node, we need to check the subtree 
under that node for equivalued edges. If an equivalued edge at the contour level 
is found, a coordinate and drawing instruction pair are issued for that 
equivalued edge (routine VISIT_SUBTREE). Once a coordinate and drawing 
instruction pair have been issued for an edge, and once the subtree beneath 
that edge has been investigated for equivalued edges, further traversal of that 
subtree is terminated. If an edge is found not to contain the contour level, the 
traversal continues as depicted at the bottom of routine VISIT. 
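
A runnable Python sketch of the traversal just described follows. It is an
illustration rather than the implementation of [Zyda, 1984a]: the Node class
keeps its descendants in a list (left, middle, right) instead of three
pointers, the z coordinate is omitted, and the grid values and connectivity
values of the sample tree are assumptions inferred from the coordinates of
Figure 4b and the geometry of the subgrid.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SETPOINT, DRAWTO, DRAWPOINT = 1, 0, 2


@dataclass
class Node:
    value: float                      # VALUE(NODE)
    xy: Tuple[float, float]           # XYZ(NODE), z omitted for one slice
    conn: int = DRAWTO                # CONN(NODE)
    children: List["Node"] = field(default_factory=list)


def interpolate(node, ancestor, level):
    """Point where the contour level crosses the edge ancestor -> node."""
    if ancestor.value == node.value:  # root visited as its own ancestor
        return node.xy
    t = (ancestor.value - level) / (ancestor.value - node.value)
    return tuple(a + t * (n - a) for a, n in zip(ancestor.xy, node.xy))


def visit(node, ancestor, level, out):
    on_edge = node.value <= level < ancestor.value
    at_root = node is ancestor and node.value == level
    if on_edge or at_root:
        out.append((interpolate(node, ancestor, level), node.conn))
        for child in node.children:   # check subtree for equivalued edges
            visit_subtree(child, node, level, out)
        return                        # no further traversal of this subtree
    for child in node.children:       # pre-order: left, middle, right
        visit(child, node, level, out)


def visit_subtree(subnode, subancestor, level, out):
    if subnode.value == level:        # equivalued edge at the contour level
        out.append((subancestor.xy, SETPOINT))
        out.append((subnode.xy, DRAWTO))
    for child in subnode.children:
        visit_subtree(child, subnode, level, out)


def contour_subgrid(root, level):
    out = []
    visit(root, root, level, out)
    return out


# Assumed tree for the subgrid bounded by (2,2), (3,2), (3,3), and (2,3).
corner33 = Node(60, (3, 3), DRAWTO,
                [Node(40, (3, 2)), Node(30, (2, 3), SETPOINT)])
center = Node(70, (2.5, 2.5), DRAWTO,
              [Node(40, (3, 2)), corner33, Node(30, (2, 3))])
root = Node(150, (2, 2), DRAWPOINT,
            [Node(40, (3, 2), SETPOINT), center, Node(30, (2, 3))])

if __name__ == "__main__":
    for (x, y), conn in contour_subgrid(root, 50):
        print(f"{x:.4f} {y:.4f} {conn}")
```

With these assumed values the traversal at levels 50 and 100 yields the
coordinates of Figure 4b in the same order.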

The pre-order traversal procedure described generates the coordinates and 
drawing instructions for the part of the 2x2 subgrid the contouring tree 
represents. To generate the coordinates for a larger two-dimensional grid, we 
generate the contouring trees for each 2x2 subgrid of that grid, and then apply 
the traversal procedure to those trees. We note here that no ordering is
required in the generation of coordinates for the 2x2 subgrids. The coordinate 
and drawing instruction set generated for each 2x2 subgrid is complete and 
independent of the picture generated for any neighboring 2x2. 

3.1. Contouring Tree Use Discussion

Having presented the use of the contouring tree, we must discuss its limita- 
tions. The initial impression is that the contouring tree provides a nice, uniform 
framework for generating the coordinates and drawing instructions appropriate 
to the 2x2 subgrid. This is close to correct but there are problems. These 
problems all concern issues of picture efficiency. Since the display generated 
for each 2x2 subgrid is generated independently of any neighboring 2x2 
subgrids, equivalued lines at the contour level on the border of a subgrid will be 
duplicated. A similar problem occurs for subgrid corner values that equal the 
contour level. If we display either of the above cases on a calligraphic display 
device, we will see a bright line for the equivalued edge, and a bright point for 
the grid value equal to the contour level. Another problem, also due to the 
independent computation of each 2x2 subgrid, is that no ordering is provided 
for coordinates that come out of this algorithm. For calligraphic displays, this is 



Contouring Tree Description

Pointers to descendent nodes:

    LEFT(NODE)
    MIDDLE(NODE)
    RIGHT(NODE)

Values associated with each node:

    VALUE(NODE): grid value.
    XYZ(NODE)  : coordinate of that grid value.
    CONN(NODE) : drawing instruction.

Procedure CONTOUR_SUBGRID(ROOT)
    VISIT(ROOT,ROOT)    // begin the traversal of the pointed at
                        // contouring tree.
end

Procedure VISIT(NODE,ANCESTOR)
    if(NODE == NULL)
    {
        return
    }

    if((VALUE(NODE) <= CONTOUR_LEVEL < VALUE(ANCESTOR))
       OR
       (VALUE(NODE) == CONTOUR_LEVEL AND NODE == ANCESTOR))
    {
        // Edge contains the contour level.
        Issue a coordinate computed via linear interpolation
        along the edge.
        Issue CONN(NODE) as the drawing instruction.

        // Check subtrees of this node for equivalued edges.
        VISIT_SUBTREE(LEFT(NODE),NODE)
        VISIT_SUBTREE(MIDDLE(NODE),NODE)
        VISIT_SUBTREE(RIGHT(NODE),NODE)

        return    // no need to examine the subtree further.
    }    // endif coordinates were generated for an edge.

    VISIT(LEFT(NODE),NODE)      // visit left subtree.
    VISIT(MIDDLE(NODE),NODE)    // visit middle subtree.
    VISIT(RIGHT(NODE),NODE)     // visit right subtree.

    return
end

Procedure VISIT_SUBTREE(SUBNODE, SUBANCESTOR)
    if(SUBNODE == NULL)
    {
        return
    }

    if(VALUE(SUBNODE) == CONTOUR_LEVEL)
    {
        Issue coordinates for the equivalued edge.
        Setpoint on XYZ(SUBANCESTOR).
        Drawto XYZ(SUBNODE).
    }

    VISIT_SUBTREE(LEFT(SUBNODE), SUBNODE)
    VISIT_SUBTREE(MIDDLE(SUBNODE), SUBNODE)
    VISIT_SUBTREE(RIGHT(SUBNODE), SUBNODE)

    return
end

Figure 7

Pseudocode of the Traversal Algorithm for the Contouring Tree
a problem because for such devices electron beam movement is expensive. A
contour display that causes the maximum movement of the electron beam every
other subgrid greatly decreases the vector capability of the calligraphic
display device.

There are three possible solutions to the first problem, that of duplicate 
vectors. The easiest solution is to choose an output display device for which 
such picture inefficiencies do not matter, i.e. a raster display. Vector ordering 
is also eliminated as a problem with this solution. The second solution to the 
vector duplication problem is to set aside points and lines at the contour level 
that correspond to subgrid boundaries. A final pass at the end of the computa- 
tion for a complete two-dimensional plane could readily cull the duplicates. This 
second solution does nothing for the vector ordering problem. This solution also 
requires a join operation on the results of the algorithm component computa- 
tions for each two-dimensional grid, and consequently, diminishes the 
algorithm’s concurrency potential. The third solution, and the most expensive 
of the three, is to merge the set of trees generated for the two-dimensional grid 
such that duplicate edges in separate trees are eliminated. This solution has 
the added benefit that the resultant contours are generated in an order that 
solves the beam movement problem. This solution is not described in detail 
here and the reader is referred to [Zyda, 1981] for further detail. For this study,
the first and simplest solution is assumed for purposes of maximizing the con- 
currency potential of the algorithm. Consequently, the expected output display 
device is the raster display. 

3.2. Contouring Algorithm Simplifications

Before we look in detail at a special architecture for computing the contour 
lines for a 2 x 2 subgrid, we first consider simplifications to that algorithm that 
greatly increase its speed. The first simplification we consider is one that elim- 
inates contouring tree construction for the 2x2 subgrid. In [Zyda, 1984a], a 
procedure for contouring tree construction is described. That procedure begins 
with the composition of a 5 x 5 adjacency matrix that represents the directed 
graph of the edges inscribed on the four grid points and center average value 
point of the 2x2 subgrid. Using a 5 x 5 adjacency matrix to describe a graph 
that has a constant set of eight edges, whose only changes are in the directions 
of those edges, is quite expensive. We can replace that adjacency matrix by a
field of eight bits, with a one indicating one direction and a zero the other. This 
replacement makes quite clear the fact that there are really only 256 possible 
configurations of contouring trees. If we remember that the center point is
never chosen as a maximum, and that subgrid digraphs without maxima have no con-
touring trees, this reduces the total to 120 possible configurations of contouring
trees [Zyda, 1984a]. With these simplifications, we can look up the tree 
configuration for a 2x2 subgrid from a small table once we have its 
configuration number. The configuration number is composed by an assignment 
of edges and directions to each bit of the eight bit number (see Figure 8). 
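
Composing a configuration number can be sketched as follows. The actual bit
assignment is the one given in Figure 8; the EDGES ordering below is a
hypothetical assignment chosen only for illustration, as is the treatment of
equal values (an edge between equal values sets no bit). The corner values
150, 40, 60, and 30 are assumed values inferred from the coordinates of
Figure 4b.

```python
# Nodes 0-3 are the subgrid corners, counterclockwise from the lower lefthand
# corner; node 4 is the center average point.
# Hypothetical assignment: bit i is one when EDGES[i] runs downhill from its
# first node to its second, and zero when the opposite direction exists.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0),    # four boundary edges
         (0, 4), (1, 4), (2, 4), (3, 4)]    # four edges to the center point


def configuration_number(corners):
    """Pack the directions of the eight subgrid edges into an 8-bit number."""
    values = list(corners) + [sum(corners) / 4.0]   # append center average
    number = 0
    for bit, (a, b) in enumerate(EDGES):
        if values[a] > values[b]:
            number |= 1 << bit
    return number


if __name__ == "__main__":
    n = configuration_number([150, 40, 60, 30])
    print(n, format(n, "08b"))
```

Under this assumed assignment the example corner values give 21, which agrees
with the configuration number quoted with Figure 9. That agreement does not
establish that Figure 8 uses the same assignment. The resulting number would
then index a small table of precomputed tree configurations.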

The second simplification we consider is one that speeds up the use of the 
contouring tree in its generation of the contours. The time consuming portion 
of this process is the traversal of the contouring tree. One speed up is to pre- 
compute the tree traversals for each contouring tree by forming a linear list of 
each tree’s edges in traversal order. The data necessary for the contouring 
trees represented in this form for the example trees of Figures 4 and 6 can be 
seen in Figures 9 and 10. The traversal is accomplished by stepping through the 
linear list of edges using the same edge evaluation scheme as described previ- 
ously, i.e. a contour level is within an edge (and hence a coordinate should be 
generated) if: 




A ONE IN THE BIT POSITION MEANS THE EDGE EXISTS.
A ZERO IN THE BIT POSITION MEANS THE EDGE OF OPPOSITE DIRECTION EXISTS.

Figure 8

Configuration Number Edge Assignments for the 2 x 2 Subgrid
Configuration Number = 21 = 0001 0101
Tree Number 1 has 9 edges.

Figure 9

Traversal List Representation of the Contouring Tree of Figure 4

Tree Number 1 has 6 edges.
Tree Number 2 has 6 edges.

Figure 10

Traversal List Representation of the Contouring Tree of Figure 6
current_node's_value <= contour_level < previous_node's_value

If a coordinate is generated for an edge, the subtree delineated by the 
"next edge" field of the table is examined for equivalued edges at the contour
level. If such equivalued edges are encountered, coordinates and drawing 
instructions appropriate to that edge are generated. Note that the traversal list 
tables of Figures 9 and 10 are in terms of the subgrid numbering scheme rather 
than in terms of explicit grid values. In the design of the architecture for the 
contour surface display generator, we use the configuration number to find the 
traversal list for the contouring tree, and use that traversal list to generate the 
display coordinates. This is instead of actually constructing and traversing the 
contouring tree. 
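
Stepping through a precomputed traversal list can be sketched in Python. The
table layout here is an assumption: each entry is (previous node, current
node, connectivity, next edge), where "next edge" is the index of the first
entry beyond that edge's subtree. The layout of Figures 9 and 10 may differ.
Nodes use the subgrid numbering 0-3 for the corners and 4 for the center, and
the grid values are the assumed values inferred from Figure 4b.

```python
SETPOINT, DRAWTO, DRAWPOINT = 1, 0, 2

# Assumed traversal list for a tree rooted at node 0 (nine entries, counting
# the root entry whose previous and current node are the same).
TRAVERSAL = [
    (0, 0, DRAWPOINT, 9),   # root, visited as its own ancestor
    (0, 1, SETPOINT, 2),
    (0, 4, DRAWTO, 8),      # subtree under the center is entries 3..7
    (4, 1, DRAWTO, 4),
    (4, 2, DRAWTO, 7),      # subtree under node 2 is entries 5..6
    (2, 1, DRAWTO, 6),
    (2, 3, SETPOINT, 7),
    (4, 3, DRAWTO, 8),
    (0, 3, DRAWTO, 9),
]


def interpolate(prev, cur, level, values, coords):
    if values[prev] == values[cur]:
        return coords[cur]
    t = (values[prev] - level) / (values[prev] - values[cur])
    return tuple(p + t * (c - p) for p, c in zip(coords[prev], coords[cur]))


def contour_from_list(traversal, values, coords, level):
    out = []
    i = 0
    while i < len(traversal):
        prev, cur, conn, next_edge = traversal[i]
        on_edge = values[cur] <= level < values[prev]
        at_root = prev == cur and values[cur] == level
        if on_edge or at_root:
            out.append((interpolate(prev, cur, level, values, coords), conn))
            # Examine the skipped subtree for equivalued edges only.
            for p, c, _, _ in traversal[i + 1:next_edge]:
                if values[c] == level:
                    out.append((coords[p], SETPOINT))
                    out.append((coords[c], DRAWTO))
            i = next_edge           # skip the rest of this edge's subtree
        else:
            i += 1
    return out


VALUES = [150, 40, 60, 30, 70]      # assumed corner values, then the center
COORDS = [(2, 2), (3, 2), (3, 3), (2, 3), (2.5, 2.5)]

if __name__ == "__main__":
    for (x, y), conn in contour_from_list(TRAVERSAL, VALUES, COORDS, 50):
        print(f"{x:.4f} {y:.4f} {conn}")
```

The loop replaces the recursive calls of the tree traversal with a single
index that either advances by one entry or jumps to the "next edge" entry.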

4. Architectural Modeling

The architectural modeling necessary to determine if a VLSI multiprocessor 
for real-time contour surface display generation is feasible is accomplished in 
two steps. The first step is the modeling of the algorithm component level (see 
Figure 11). The purpose of this step is to determine if the amount of code 
specified for the algorithm component computation is executable in real-time. 
In this step, an implementation of the algorithm component is analyzed. The 
analysis is performed in the context of a processor whose characteristics are 
similar to those of a general purpose microprocessor, the MC68000. The model 
constructed is a register transfer model of the algorithm component. In this 
model, the memory references that are made for each instruction’s operation 
and for each operand’s retrieval during the execution of the algorithm com- 
ponent are counted and recorded. Since the number of memory references a 
program makes is proportional to its run time, we only have to multiply by the 
amount of time a memory reference requires in order to obtain a measure of 
the real-time capability of the algorithm component processor ([Zyda, 1981], 
[Zyda, 1982], [Zyda, 1983], [Zyda, 1984a], [Aho, 1974], and [Fuller, 1977]). The
value used in this study is 250 nsec per memory reference. This value is the 
slowest access time indicated for dynamic RAM (DRAM), and ROM chips 
announced over the last year in the IEEE journals Computer and Micro (see Fig-
ure 12). Since there are access times indicated that are less than half that
value, i.e. 70 nsec, we are conservative in the choice of 250 nsec as the time 
required to complete a memory reference. 
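
The arithmetic behind this timing model can be sketched in a few lines of
Python. The function names are hypothetical, and the budget shown assumes that
a processor has one full frame time available for its computation, which is a
simplification of the system modeled in this study.

```python
MEMORY_REFERENCE_NS = 250   # conservative access time used in this study


def run_time_seconds(memory_references, ns_per_reference=MEMORY_REFERENCE_NS):
    """Estimated run time: number of memory references times access time."""
    return memory_references * ns_per_reference * 1e-9


def reference_budget(frames_per_second, ns_per_reference=MEMORY_REFERENCE_NS):
    """Whole memory references that fit into one display frame time."""
    frame_ns = 1_000_000_000 // frames_per_second
    return frame_ns // ns_per_reference


if __name__ == "__main__":
    print(reference_budget(30))      # budget at one-thirtieth of a second
    print(reference_budget(60))      # budget at one-sixtieth of a second
    print(run_time_seconds(1000))    # time for 1000 memory references
```

At 250 nsec per reference, one-thirtieth of a second holds 133,333 references
and one-sixtieth of a second holds 66,666.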

The second step in the architectural feasibility modeling is the modeling of 
the total system of algorithm component processors (see Figure 13). The pur- 
pose of this step is to determine the total number of processors we can use in 
parallel, the load (number of algorithm components) per processor, and the 
total real-time capability of that system, i.e. the size of the largest three- 
dimensional grid for which we are able to generate the contour surface display 
in real-time. This part of the modeling effort extends the algorithm component 
modeling results to a model of the total system architecture for the real-time 
contour surface display generator. With the structure and real-time capability 
of the algorithm component processor established, we determine the capabili- 
ties of a system utilizing multiple copies of that processor. The parameters of 
the complete system modeled are derived from the requirements of the applica- 
tions. The parameters utilized include such factors as the total size of the 
inputs and outputs, and the total number of algorithm components (and hence, 
the total number of algorithm component processors). 
4.1. Architecture for the Algorithm Component

We begin the description of the architecture for the algorithm component 
with an overview diagram (see Figure 14). In that figure the important architec- 
tural pieces of the processor and their interconnections are depicted. The 
pieces shown are found in most processors. The important topics for our discus-
sion are (1) the use of the hardware in the implementation of the algorithm
component, and (2) the sizes of the hardware elements depicted in the figure. 

In order to detail the sizes of the hardware elements in the figure, we first 
describe the operations expected of the algorithm component processor. There 
are only four: (1) reset the entire system of algorithm component processors,
(2) accept a 2 x 2 subgrid description into a particular algorithm component
processor, (3) place the coordinates generated for a particular 2x2 subgrid 
onto the system bus, and (4) generate the contours for the 2x2 subgrid held in 
the algorithm component processor. The first operation, the reset operation for 
the entire system of algorithm component processors, is clearly required. Com- 
puting systems are never constructed without some mechanism for providing a 
known initial state of the hardware. 

The second operation, that of accepting a subgrid definition into a particu- 
lar algorithm component processor, has implications for both the size of the 
RAM of the processor, and for the performance of its external communication 
mechanism. For that operation, the algorithm component processor needs to 
be able to recognize when a subgrid definition is addressed to it, and then needs 
to be able to store that information into its RAM. For both parts of this opera- 
tion, we need to evaluate the size of the input to the algorithm component pro- 
cessor. This is accomplished by making a short list of the data input for a single 
algorithm component: 



(1) 4 quantities for the grid values on the corners 
of the 2x2 subgrid (16 bytes) 

(2) 2 values representing the lower lefthand coordinate 
of the 2x2 subgrid (2 bytes) 

(3) 2 values representing the orthogonal coordinate and 
the orthogonal coordinate type (2 bytes) 

(4) 1 value for the contour level (4 bytes) 

If we assume 32-bit transfers to the algorithm component processor, this is a 
total of 6 references per 2x2 for the input operation, requiring an equivalent 
amount of RAM storage. 

The third operation, that of placing the coordinates generated in a particu- 
lar algorithm component processor onto the system bus, has implications simi- 
lar to that of the input operation. For the output operation, the algorithm com- 
ponent processor needs to be able to recognize when it should deposit its coor- 
dinates onto the system bus, and needs to be able to provide RAM storage for 
those output coordinates beforehand. From [Zyda, 1984a], we know that the 
largest output that can be generated for a 2 x 2 subgrid is 6 coordinate and 
drawing instruction quadruples (78 bytes). If we count the byte indicating the 
number of coordinates output, we need to perform 20 32-bit transfers for the 
output operation, and need to provide an equivalent amount of RAM storage. 
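
The transfer counts for the input and output operations can be checked with a short calculation. The sketch below is illustrative only: the byte sizes are those quoted above, while the dictionary keys and the helper name `transfers` are our own labels and not part of the original design.

```python
import math

WORD_BYTES = 4  # 32-bit transfers, as assumed in the text

# Input for one 2x2 subgrid (byte sizes taken from the list above).
input_bytes = {
    "corner grid values": 16,
    "lower-left coordinate": 2,
    "orthogonal coordinate and type": 2,
    "contour level": 4,
}

# Worst-case output: 6 quadruples (78 bytes) plus 1 byte for the count.
output_bytes = 78 + 1


def transfers(n_bytes):
    """Number of 32-bit transfers needed to move n_bytes (hypothetical helper)."""
    return math.ceil(n_bytes / WORD_BYTES)


input_transfers = transfers(sum(input_bytes.values()))
output_transfers = transfers(output_bytes)
print(input_transfers, output_transfers)
```

The 79 output bytes do not fill a whole number of words, so the count is rounded up to the next full transfer.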

The fourth operation, that of generating the contours for the 2x2 whose
definition is held in the algorithm component processor, affects the size of all
the memories in the algorithm component processor. If we use the algorithm
simplifications described above, this means we need to provide space for the 
tree traversal list tables (2681 bytes), the algorithm component miscellaneous 
variables (45 bytes), and the code that performs the algorithm component com-
putation (3080 bytes). (A comprehensive listing of all the data required in the
algorithm component computation can be found in [Zyda, 1984a], Figure 3.1.)
The estimates for the input, output, tree traversal tables, and miscellaneous are 
derived directly from the data and data sizes required for the computation of 
the algorithm component. All the data sizes are rounded to the nearest byte, 
except for the large tree traversal tables where estimates are quoted in terms of 
the number of bits needed. The bit-wise specifications for the traversal tables
are summed and then divided by eight to give a total expressed in bytes.

The estimate for the size of the code required in the algorithm component 
processor is computed by totaling the number of instructions used in the four 
routines that comprise the register transfer model of that algorithm com- 
ponent. A value of four bytes per instruction is assumed. The values obtained 
for each of the four modeled routines are (1) 792 bytes for the control program
of the contouring operation, (2) 500 bytes for computing the subgrid coordi- 
nates and average value point, (3) 304 bytes for computing the contouring tree 
configuration number, and (4) 1484 bytes for the traversal list usage and coordi- 
nate generation routine. 

Combining the data and code totals, the algorithm component processor is 
seen to require 5909 bytes of storage, 148 bytes for input, output, miscellane- 
ous, and temporaries (read/write memories), and 5761 bytes for the code and 
tree traversal lists (read-only memories). In our computation of the size of the 
algorithm component processor, the above values represent the space needed 
for registers, random-access and read-only memories. Space estimates for the 
rest of the hardware are not included. In order to provide a size value for the 
remainder of the architectural features in the algorithm component processor, 
we need to enumerate those hardware requirements. 
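
The storage totals above follow from simple sums, sketched below. The split of the 148 read/write bytes into 24 input bytes, 79 output bytes, and 45 miscellaneous bytes is our reading of the earlier figures, not a breakdown stated explicitly in the text, and the variable names are illustrative.

```python
# Code sizes for the four modeled routines (bytes, from the text).
code_bytes = {
    "control program": 792,
    "subgrid coordinates and average value point": 500,
    "configuration number": 304,
    "traversal list usage and coordinate generation": 1484,
}
code_total = sum(code_bytes.values())

TREE_TABLE_BYTES = 2681

# Assumed breakdown of the read/write storage:
# input (24) + worst-case output with count byte (79) + miscellaneous (45).
read_write_bytes = 24 + 79 + 45

read_only_bytes = code_total + TREE_TABLE_BYTES
total_bytes = read_write_bytes + read_only_bytes
print(code_total, read_write_bytes, read_only_bytes, total_bytes)
```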

The control portion of the algorithm component processor is shown in the 
right half of Figure 14. It is composed of the external instruction register, the 
microprogram logic, the decoder, and the microcode ROM. There is nothing spe-
cial expected for this control section that is not standard among most proces-
sors. The only important feature is the relatively large microcode ROM that con-
tains the actual contouring program. Above, we stated that this ROM required a
minimum of 3080 bytes in order to be able to perform the expected operations. 
Rounding this to a power of two, and assuming horizontal microprogramming 
for the algorithm component processor, a 1024 by 32-bit memory is the esti- 
mate that is used in our VLSI feasibility determination. 

Continuing with the topic of rounding the memory sizes and widths of the
ROMs and RAMs specified on Figure 14, we find that the tree traversal ROM, origi- 
nally specified as requiring a minimum of 2681 bytes, is best configured as 2048 
by 16 bits. The reason for this large increase in the space requirement for the 
tree traversal tables is that the edge entries are expanded to 16 bits rather than 
the original 12 bits as specified above. The RAM of Figure 14, used to hold the 
subgrid definition, the coordinates generated, and any temporaries, is assumed 
to be 64 by 32 bits, up from the originally specified minimum of 148 bytes. We
should note at this point that the ROMs and RAMs specified are expected to con-
sume the majority of the area on the VLSI chip.
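
The rounding of the 32-bit wide memories can be illustrated as follows. This is only a sketch: the helper `next_power_of_two` is our own, and the tree traversal ROM is omitted because its 2048 by 16 bit configuration depends on the re-encoding of the edge entries, which cannot be reproduced from the byte count alone.

```python
import math


def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1 (hypothetical helper)."""
    return 1 << (n - 1).bit_length()


WORD_BYTES = 4  # 32-bit wide memories

# Microcode ROM: a minimum of 3080 bytes of code.
microcode_words = math.ceil(3080 / WORD_BYTES)
microcode_depth = next_power_of_two(microcode_words)

# RAM: a minimum of 148 bytes for input, output, and temporaries.
ram_words = math.ceil(148 / WORD_BYTES)
ram_depth = next_power_of_two(ram_words)

print(microcode_words, microcode_depth)
print(ram_words, ram_depth)
```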

The ALU and the register block of Figure 14 are the remaining items for
which we must develop a size estimate. The register block has no special 
requirements other than that there be about eight 32-bit registers. This is not a 
measured requirement but rather one suggested by the designs of other 
microprocessors. The ALU shown in Figure 14 is dealt with in the same way as 
the rest of the hardware in that we assume it too is little different than ALUs 
found in currently produced microprocessors. This means it has the capability 
to perform integer addition, integer subtraction, integer division, and integer 
multiplication. It would be nice to have floating point operations directly in the 
ALU but this is expensive and consumes considerable area on the VLSI chip. Any 
floating point operations we need to perform can be simulated using the integer 
arithmetic capabilities provided by this minimal ALU. It should be noted that 
the algorithm component of the contour surface display generation algorithm 
was originally implemented entirely with integer arithmetic. 

4.2. Real-Time Capability of the Algorithm Component Processor 

In order to determine if the amount of computation specified for the algo- 
rithm component is executable in real-time, one-thirtieth of a second, we need 
to put together a register transfer model of that algorithm and then to execute 
that model with the worst-case inputs for the algorithm. As indicated above, a
register transfer model counts the total number of memory references made by 
the algorithm component for both operation executions, and operand retrievals. 
There are four parts to the register transfer model of the algorithm component: 
(1) the input of the 2x2 subgrid to the algorithm component processor, (2) the 
output from the algorithm component processor, (3) the tree construction 
(traversal list indexing), and (4) the contour generation (traversal list usage). 
The memory reference count for each of these parts of the algorithm com- 
ponent needs to be modeled and totaled in order to determine the feasibility of 
executing in real-time a complete, worst-case set of input data. 

The first part of the register transfer model is the number of memory refer- 
ences required to complete the input of the 2x2 subgrid to the algorithm com- 
ponent processor. The total number of 32-bit transfers for this operation was 
obtained in the previous section — 6 32-bit transfers per 2x2 subgrid. The 
second part of the register transfer model, the number of memory references 
required to complete the maximum sized output, was also obtained in the previ- 
ous section - 20 32-bit transfers per 2x2 subgrid. The third part of the regis- 
ter transfer model, the tree construction (traversal list indexing), requires 
602 32-bit references to (1) compute the center average value point from the 
four subgrid points (263 references), (2) determine if the points are in range of 
the current contour level (177 references), and (3) compute the configuration 
number (162 references). The fourth part of the register transfer model, the 
contour generation (traversal list usage), requires a maximum of 2048 32-bit 
references. It should be noted that this maximum is obtained for the subgrid 
that generates the maximum number of coordinate and drawing instruction qua- 
druples, six quadruples per 2x2. For typical applications, the average number 
of coordinate and drawing instruction quadruples generated for the set of 2 x 2 
subgrids that generate coordinates at all is 2.54 quadruples per 2x2. This value 
was obtained empirically through the monitoring of the execution of the con- 
touring algorithm on several data sets typical of the expected applications. 
Though the use of this average number of coordinates could significantly lessen 
the number of memory references found for the contour generation (traversal 
list use) part of the register transfer model, the worst case of six quadruples, 
corresponding to 2048 memory references, must be used in the determination 
of the real-time capability of the algorithm component processor. The worst 
case must be used because that case indicates the longest time the system of
algorithm component processors will require for the completion of the contour- 
ing operation. 

Once we have obtained the memory reference count for all four parts of the 
register transfer model of the algorithm component, we can total the memory 
references and determine if that component can be executed in real-time. For 
the register transfer model of the algorithm component, a total of 2676 memory 
references are required for (1) the input to the algorithm component processor
(6 memory references), (2) the output from the algorithm component processor 
(20 memory references), (3) the tree construction, or traversal list indexing, for 
the 2x2 subgrid (602 memory references), and (4) the contour generation from 
the trees generated, or traversal lists indexed, (2048 memory references). At 
250 nsec per reference, this is about 669 microseconds -- clearly under the one- 
thirtieth of a second (33,333 microseconds) goal we set for the algorithm com- 
ponent processor. In fact, given one-thirtieth of a second, we can accomplish 
about 50 algorithm component computations in serial. Now that we have esta- 
blished the feasibility of computing the algorithm component in real-time with 
the architecture proposed, we need to design a larger system of multiple algo- 
rithm component processors. 
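
The worst-case timing above can be reproduced with a few lines of arithmetic. The reference counts and the 250 nsec figure are from the text; the variable names are illustrative.

```python
NSEC_PER_REFERENCE = 250

# Worst-case memory references for the four parts of the model.
references = {
    "input": 6,
    "output": 20,
    "tree construction": 263 + 177 + 162,
    "contour generation": 2048,
}

total_references = sum(references.values())
time_usec = total_references * NSEC_PER_REFERENCE / 1000

# One-thirtieth of a second, in microseconds.
frame_usec = 1_000_000 / 30
serial_computations = frame_usec / time_usec

print(total_references, time_usec, round(serial_computations, 1))
```

The ratio comes out just under 50, which is the "about 50" serial computations quoted above.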

5. Larger System of Multiple Algorithm Component Processors 

The first issue of importance that must be covered when considering the 
design of the larger system is the issue of how operations and data are commun- 
icated. Figure 15 contains a view of the proposed interconnection scheme for 
the algorithm component processors. In that figure, each processor is depicted 
as being connected to a system bus, and a serial control line called the count- 
enable line. As indicated in Figure 14, the system bus provides both data and 
instructions to the algorithm component processor. It also provides the path- 
way for data output back to the display controller. Not so clear in that figure is 
the function of the count-enable line. The count-enable line is a one bit control 
line that runs in a daisy-chain fashion from one algorithm component processor 
to the next. Its function is to provide a processor addressed capability for 
operations indicated to the larger system of processors. Its effect is to serialize 
the execution of processor addressed operations such as data input and output. 
This is accomplished in the following manner. Each algorithm component pro- 
cessor uses the logical OR of the global control line contained in the system bus 
and the count-enable line to determine if it should gate in the instruction 
currently presented on the system bus. A signal on the global control line indi- 
cates a global operation, and means that all processors of the system should 
perform the specified operation. Global operations are used to initiate the 
highly parallel computations of the algorithm component. A signal on the 
count-enable in line for an algorithm component processor indicates a proces- 
sor addressed operation, and means that the instruction and any following data 
on the system bus are addressed to that specific processor. Once an algorithm 
component processor has gated in a processor addressed instruction and its 
data, it then sets the count-enable out line high. The setting of the count-enable 
out line to high indicates to the next processor in the chain that it should gate in 
the instruction and data next on the system bus. The count-enable mechanism 
is used to propagate processor addressed instructions throughout the system in 
an orderly fashion. Its effect is to serialize the execution of operations such as 
the input of data to and the output of data from each algorithm component pro- 
cessor. 
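
The count-enable mechanism can be illustrated with a small software simulation. This is a sketch under stated assumptions, not the hardware design: the class and function names are hypothetical, and we assume that a processor which has already raised its count-enable out line ignores further processor addressed instructions until the system is reset, a detail the description above leaves open.

```python
class Processor:
    """Toy model of one algorithm component processor (hypothetical)."""

    def __init__(self, ident):
        self.ident = ident
        self.count_enable_out = False
        self.received = []

    def gate(self, global_line, count_enable_in, instruction):
        """Gate in the bus instruction if either control line selects us."""
        if global_line:
            # Global operation: every processor performs it.
            self.received.append(instruction)
            return True
        if count_enable_in and not self.count_enable_out:
            # Processor addressed operation: take it, then pass the
            # enable signal on to the next processor in the chain.
            self.received.append(instruction)
            self.count_enable_out = True
            return True
        return False


def broadcast(processors, instruction):
    """Present a global instruction on the system bus."""
    for p in processors:
        p.gate(True, False, instruction)


def send_addressed(processors, instruction):
    """Present one processor addressed instruction; return who took it."""
    enable = True  # the controller drives the first count-enable in line
    for p in processors:
        if p.gate(False, enable, instruction):
            return p.ident
        enable = p.count_enable_out
    return None


chain = [Processor(i) for i in range(3)]
takers = [send_addressed(chain, f"load subgrid {i}") for i in range(3)]
broadcast(chain, "generate contours")
print(takers)
```

Each addressed instruction is taken by exactly one processor, in chain order, while the global instruction reaches all of them.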

It should be noted at this point that other processor interconnection 
schemes such as multiple buses for parallel data output have not been 
considered in this study. The reason for this limitation is that the currently
available display devices to which the output is directed have only a single, 8 to
32 bit wide pathway for display list modification. The design of a display device
with multiple, parallel pathways for display list modification is outside the scope 
of this study. 

In order to complete our description of the communication mechanism for 
the system of multiple algorithm component processors, we need to estimate 
the widths of (1) the system bus data and control lines, (2) the count-enable
lines, (3) the external instruction register, and (4) the external data register.
The system bus and count-enable lines sizes are the most important because 
they extend across VLSI chip boundaries, and hence require package pins. The 
count-enable lines require two bits, one into and one out of each algorithm com- 
ponent processor. This requires two pins on the VLSI chip. The system bus 
specification is more difficult in that we have both data and control line widths 
to specify. The width of the data portion of the system bus is chosen to be 32 
bits. This figure is based upon the number of pins we expect to be able to spare 
on the VLSI chip, and upon the fact that we assume a 32-bit processor, and 32- 
bit transfers in our register transfer models. In order to determine the width of 
the control line portion of the system bus, we need to compose a list of the sig- 
nals we expect it to carry: 

(1) global/processor addressed bit (1 bit)

(2) instruction bits (3 bits) 

(3) data transfer control lines (6 bits) 

(4) miscellaneous control lines (6 bits) 

The sizes indicated for the data transfer and miscellaneous control lines are 
taken from the bus designs for similarly sized processors and are not exact 
[Kayes, 1978]. The values quoted only serve as an estimate on the number of 
control signals expected. Consequently, the total estimate for the control por-
tion of the system bus is 16 bits, for a bus total of 48 bits. Adding the two pins
for the count-enable lines, this means a minimum of 50 pins on the VLSI chip.
This is somewhat under the current package limit of 64 pins, and allows room for 
additional pin requirements. 
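
The pin estimate is a direct sum of the signal widths listed above, as the following sketch shows. The widths are from the text; the names are illustrative.

```python
DATA_BITS = 32

# Control signals carried by the system bus (widths from the list above).
control_bits = {
    "global/processor addressed": 1,
    "instruction": 3,
    "data transfer control": 6,
    "miscellaneous control": 6,
}

COUNT_ENABLE_PINS = 2  # one line in, one line out
PACKAGE_LIMIT = 64

control_total = sum(control_bits.values())
bus_total = DATA_BITS + control_total
pins = bus_total + COUNT_ENABLE_PINS
spare_pins = PACKAGE_LIMIT - pins

print(control_total, bus_total, pins, spare_pins)
```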

The sizes of the external data register and the external instruction register 
are set by the data width assignments made for the system bus. The instruction 
portion of the system bus was set at three bits based upon the fact that there 
are only four operations we expect to signal to the algorithm component proces- 
sor. Consequently the external instruction register only needs three bits. The 
purpose of the external instruction register is to hold a signaled instruction 
until the control portion of the algorithm component processor is finished with 
its previous operation and ready to execute a new one. 

The external data register is used to transfer data to/from the algorithm 
component processor from/to the data portion of the system bus. The data that 
is transferred into the algorithm component processor is data such as the
subgrid definition and the new contour level. The data transferred out of the
algorithm component processor is the set of coordinate and drawing instruction 
quadruples generated by the last execution of the generate contour instruction. 
Since the data width portion of the system bus is set at 32 bits, the external 
data register is also 32 bits. The initiation of data transfers through the exter- 
nal data register is carried out by the control section of the algorithm 
component processor.

5.1. Modeling the Larger System of Algorithm Component Processors 

The purpose of the model for the larger system of algorithm component 
processors is to answer the question of exactly how many algorithm component 
computations can be executed in parallel in one-thirtieth of a second, with the 
only limitation being that the coordinates and drawing instructions must be 
delivered within that same time period. For this model, we assume an infinite 
capability for processors. We also assume that to obtain the highest processor 
utilization, the individual processor may be responsible for multiple, serial algo- 
rithm component computations. The timing values for this step are obtained by 
extending the register transfer model developed for the algorithm component 
processor. 

In order to determine the number of maximal algorithm component compu- 
tations we can execute in parallel, we compose a model of that system: 

Real-Time Available = Input Time + Computation Time + Output Time



The model forms a simple linear equation, with the real-time available on one 
side and the input, output, and computation times on the other. For this model, 
we make the following assumptions: (1) the amount of real-time available is
33,333 x 10^-6 seconds, (2) all of the algorithm component computations occur in
parallel, so only one maximal computation is added to the model’s equation 
(2650 references @250 nsec/reference), (3) the only input is the single 32 bit 
new contour level, distributed to all processors via a global command (1 refer- 
ence @250 nsec/reference), (4) the size of the output from each algorithm com- 
ponent computation is of average size (2.54 coordinates and drawing instruction 
quadruples, or 9 references, for each 2x2 subgrid that generates coordinates 
[Zyda, 1984a]). The model has the following equation:

33,333 x 10^-6 sec = (1 x 250 x 10^-9 sec) + (2650 x 250 x 10^-9 sec)
                     + (9 x X x 250 x 10^-9 sec)

The variable X stands for the maximum number of algorithm component compu- 
tations that the modeled system can handle. Solving for X, we find that we can 
compute in parallel, in one-thirtieth of a second, 14,520 algorithm component 
computations, generating a total of 36,880 coordinate and drawing instruction 
quadruples. Again, this requires some 14,520 processors, each operating in 
parallel. 
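
The solution for X can be checked numerically. The sketch below works in units of memory references rather than seconds, using the figure of 133,333 references per one-thirtieth of a second at 250 nsec per reference; the variable names are illustrative.

```python
REFS_PER_FRAME = 133_333        # one-thirtieth of a second at 250 nsec/reference
INPUT_REFS = 1                  # new contour level, sent as a global command
COMPUTE_REFS = 2650             # one maximal computation (all run in parallel)
OUTPUT_REFS_PER_SUBGRID = 9     # average-size output for one 2x2 subgrid
AVG_QUADRUPLES = 2.54

# Output is serialized on the system bus, so it scales with X.
x = (REFS_PER_FRAME - INPUT_REFS - COMPUTE_REFS) // OUTPUT_REFS_PER_SUBGRID
quadruples = int(x * AVG_QUADRUPLES)

print(x, quadruples)
```

The output term dominates: nearly all of the available time is spent delivering coordinates over the single system bus.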

6. Further Applications Details 

Once we have an idea of approximately how many algorithm component 
computations we can perform in one-thirtieth of a second, we then need to 
further examine the particular real-time application in order to determine if we 
are able to handle the expected maximum input data grid. Using the molecular 
modeling program presented above as the typical application, we find that the 
largest three-dimensional grid of interest is a cube of 30 units on each side 
[Barry, 1979]. As discussed, a contour surface display is created for a three-
dimensional grid by generating the coordinates and drawing instructions for all
possible orthogonal two-dimensional grids of that larger grid. For the 
30 x 30 x 30 grid, this is 90 30 x 30 grids. Specifying this in total 2x2 subgrids, 
this is 75,690 2 x 2s that must be computed in one-thirtieth of a second. From 
our architectural discussion, we found that we have the capability for generating 
coordinates from 14,520 2x2 subgrids in one-thirtieth of a second. Given that 
this is considerably under the total number of 2 x 2s, there are several questions 
for which we must provide answers: 

(1) For the applications of interest, what is the maximum number of 2 x 2s (of 
the 75,690 total) for which we expect to generate coordinates and drawing 
instructions? 

(2) What is the maximum number of coordinates we expect to generate for those 
applications? 

(3) How do we handle 2 x 2s that do not generate coordinates?

(a) Do we send the 2x2 subgrids to the algorithm component proces- 
sors each time a new contour level is set, eliminating non-productive 2 
x 2s at a higher level? 

(b) or do we double up the processors we can handle with 2 x 2s of 
non-overlapping grid value ranges? 

The first and second questions are related so we answer them by referring 
to studies of the applications of contour surface display generation. For those 
applications, we see that the maximum observed percentage of 2 x 2s that gen- 
erate coordinates is 13 percent, or around 9900 2 x 2s. The number of coordi- 
nates generated for that system, the maximum number for our applications pur- 
poses, is 25,150 coordinate and drawing instruction quadruples. Clearly this is 
within the capabilities shown above for the system of algorithm component pro- 
cessors. 
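
The subgrid and coordinate counts used in this discussion can be reproduced as follows. This is a sketch of the arithmetic only; the variable names are ours, and the figure of 9900 is the rounded value used in the text rather than the exact 13 percent.

```python
SIDE = 30                 # grid points along each edge of the cube
AVG_QUADRUPLES = 2.54     # average quadruples per productive 2x2

planes = 3 * SIDE                          # orthogonal 30 x 30 grids
subgrids_per_plane = (SIDE - 1) ** 2       # 2x2 subgrids in one 30 x 30 grid
total_subgrids = planes * subgrids_per_plane

productive_exact = 0.13 * total_subgrids   # 13 percent observed maximum
PRODUCTIVE_ROUNDED = 9900                  # value used in the text
quadruples = round(PRODUCTIVE_ROUNDED * AVG_QUADRUPLES)

print(planes, subgrids_per_plane, total_subgrids)
print(productive_exact, quadruples)
```

Both the 9900 subgrids and the roughly 25,150 quadruples fall below the 14,520 subgrids and 36,880 quadruples found for the modeled system.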

The third question, that of how we handle 2 x 2s that do not generate coor- 
dinates, is more difficult to answer. One possibility, as indicated in 3a, is to 
eliminate non-productive 2 x 2s at a higher level, sending only the coordinate 
productive ones to the algorithm component processors each time a new con- 
tour level is indicated. If we model this situation in a manner similar to that 
shown above, and assume an average number of coordinates generated for each 
2x2, we find that the system can handle a maximum of 8712 2 x 2s in one- 
thirtieth of a second, not counting the time required for filtering out the non- 
productive 2 x 2s. This is not large enough to handle the maximal problem of 
9900 2 x 2s computed in parallel though it is not a bad solution. The only prob- 
lem with this solution is that it requires a higher level mechanism of some intel- 
ligence. We prefer to place all of the operations required for contour surface 
display generation into the multiprocessor system. If we were to build the mul- 
ticomputer based upon this, it would require 8712 algorithm component proces- 
sors of the type described above. 
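
The text does not give the equation behind the figure of 8712. One reading that reproduces it, offered here as an assumption, is that each productive 2x2 now costs 6 references for its input in addition to the 9 references for its average-size output, both serialized on the system bus, while a single worst-case computation of 2650 references is performed in parallel.

```python
REFS_PER_FRAME = 133_333       # one-thirtieth of a second at 250 nsec/reference
COMPUTE_REFS = 2650            # one maximal computation, done in parallel
INPUT_REFS_PER_SUBGRID = 6     # assumed: subgrid definition resent each level
OUTPUT_REFS_PER_SUBGRID = 9    # average-size output

per_subgrid = INPUT_REFS_PER_SUBGRID + OUTPUT_REFS_PER_SUBGRID
max_subgrids = (REFS_PER_FRAME - COMPUTE_REFS) // per_subgrid

print(max_subgrids)
```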

The second possibility, that mentioned in 3b, is to double up the algorithm 
component processors with 2 x 2s of non-overlapping grid point value range. 
Non-overlapping 2 x 2s never generate coordinates for the same contour level. If 
we keep track of the ranges for each 2x2, and the processor range in each 
algorithm component processor, we have a method for examining and comput- 
ing coordinates for all 75,690 2 x 2s in roughly the same amount of time it takes 
to perform the calculation on only those 2 x 2s that generate coordinates. The
only question is whether we can find enough non-overlapping 2 x 2s in the typical
problem to allow this solution. The answer is that we certainly can. From studies of the
value ranges of the grids we expect to encounter, we find that for a system of 
75,690 2 x 2s the maximum number of non-overlapping partitions is about 
16,000. This is an average of five 2x2 subgrids per partition, with an observed 
maximum of fifteen 2x2 subgrids in a single partition. Extrapolating these 
figures to the architecture, we find a requirement of 16,000 algorithm com- 
ponent processors, with a storage capacity of 15 2 x 2 subgrids in each proces- 
sor. 
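
The text does not say how the partitions are formed. As one possible illustration, the sketch below groups 2x2 subgrids greedily by their grid value ranges so that no two ranges in a partition overlap; the function names, the greedy strategy, and the sample ranges are all our assumptions.

```python
def partition_by_range(ranges, capacity=15):
    """Greedily group (lo, hi) value ranges into non-overlapping partitions.

    Ranges are visited in order of increasing lo. Within a partition the
    ranges stay sorted and disjoint, so only the last one needs checking.
    """
    partitions = []
    for lo, hi in sorted(ranges):
        for part in partitions:
            if len(part) < capacity and part[-1][1] < lo:
                part.append((lo, hi))
                break
        else:
            partitions.append([(lo, hi)])
    return partitions


def active_subgrid(partition, level):
    """Return the one range in a partition that contains the contour level."""
    for lo, hi in partition:
        if lo <= level <= hi:
            return (lo, hi)
    return None


# Hypothetical value ranges for five 2x2 subgrids.
sample = [(0, 3), (2, 5), (4, 7), (6, 9), (8, 10)]
parts = partition_by_range(sample)
print(parts)
print(active_subgrid(parts[0], 5))
```

Because the ranges in a partition never overlap, at most one 2x2 per partition can produce coordinates for a given contour level, which is what allows one processor to hold the whole partition.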

The above has discussed one architecture for real-time contour surface 
display generation. The goal that guided the design of that architecture was the 
use of all of the parallelism available from the decomposition of the complete 
algorithm. There are clearly alternate architectures, not all of which can be dis- 
cussed in this study. One such architecture is suggested by our original note, in 
the discussion of the real-time capability of the algorithm component processor, 
that each algorithm component processor could accomplish about 50 algorithm 
component computations in serial in one-thirtieth of a second. Before we can 
close the discussion of architecture for the contour surface display generator, 
we must consider a system of multiple algorithm component computations
being performed in serial by a single algorithm component processor. 

The model for such a system is easily composed from the data computed 
and derived for the highly parallel system. We will skip the preliminary con- 
siderations and model the system with the following assumptions. The input 
subgrids are already loaded into each algorithm component processor. The out- 
put from the total system of algorithm component processors is of average size, 
i.e., 2.54 coordinate and drawing instruction quadruples are generated from 9900
2x2 subgrids, for a total of 25,146 quadruples, or 89,100 memory references. 
The output is 32 bits wide, again due to the design of the display processor. In 
one-thirtieth of a second, there are 133,333 memory references using the figure 
of 250 nsec per memory reference. Subtracting the total number of memory 
references required for the output from the total number of memory references 
in one-thirtieth of a second, we find that 44,233 memory references are available 
for the computation of multiple subgrids in a single algorithm component pro- 
cessor. Dividing the total available computation time by the maximum amount 
of time an algorithm component processor could spend on a single algorithm 
component computation, 2650 memory references, we find that each algorithm 
component processor can compute the display for 16 subgrids in serial, with the 
system still being able to deliver the output in real-time. Dividing the total 
number of subgrids considered for our applications, 75,690 subgrids, by 16, we 
find that we need 4731 algorithm component processors. 
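
The serial model reduces to the following arithmetic, again expressed in memory references. The figures are those given above; the variable names are illustrative.

```python
import math

REFS_PER_FRAME = 133_333      # one-thirtieth of a second at 250 nsec/reference
PRODUCTIVE_SUBGRIDS = 9900
OUTPUT_REFS_PER_SUBGRID = 9   # average-size output
COMPUTE_REFS = 2650           # worst-case single computation
TOTAL_SUBGRIDS = 75_690

output_refs = PRODUCTIVE_SUBGRIDS * OUTPUT_REFS_PER_SUBGRID
compute_budget = REFS_PER_FRAME - output_refs

# Serial computations one processor can finish in the remaining time.
serial_per_processor = compute_budget // COMPUTE_REFS
processors = math.ceil(TOTAL_SUBGRIDS / serial_per_processor)

print(output_refs, compute_budget, serial_per_processor, processors)
```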

Referring back to the discussion of our ability to coalesce the 75,690 
subgrids into 16,000 partitions, each partition containing a maximum of 15 
subgrids of non-overlapping grid values, we find that we really only have a 
requirement for 16,000 subgrid computations. If we design each algorithm com-
ponent processor to hold 16 of these partitions, i.e., each processor has the
capability for 15 times 16 subgrids, then we really only need 1000 processors. 
The only differences from the algorithm component processor previously 
described are (1) a larger RAM for the extra subgrid definitions, (2) a larger
microcode ROM for the value range acceptance mechanism, and (3) a wider 
instruction portion of the system bus. The additional memory requirements are 
shown in Figure 16. 




FIGURE 15 

BLOCK DIAGRAM OF THE FINAL ALGORITHM COMPONENT PROCESSOR 

7. VLSI Feasibility for the Contour Surface Display Generator

The above discussion has left us with an outline of the architecture neces- 
sary for real-time contour surface display generation. An important factor to 
consider at this point is the actual feasibility of implementing such a system in 
the VLSI technology. For this feasibility determination, we need to compute a 
value for the hardware complexity. The chief components of this complexity are 
the total number of transistors required, and the total number of VLSI chips. 
Once these values are obtained, we can then make a statement as to the feasibil- 
ity of actually constructing the real-time contour surface display generator. 

From the architectural specification, we can compute a value for the circuit 
complexity if we make some fairly simple assumptions. The first assumption is 
that if we obtain a circuit complexity for the algorithm component processor, 
then all we have to do to get the total system complexity is multiply by the total 
number of processors required. The second assumption is that the complexity 
of the algorithm component processor is less than or equal to the complexity of 
a known microprocessor, say perhaps the MC68000 used in our evaluation of the 
algorithm component’s real-time capability. One paper, [Frank, 1981], provides 
a comparison of the Motorola MC68000 and the Zilog Z8000 with figures for the
total number of transistors. For the MC68000, the total transistor count is 
approximately 68,000, with 50,000 of those transistors being in the microcode 
ROMs and PLAs and the remaining 18,000 being in the registers and random 
logic. For the Z8000, the total transistor count is specified as 17,500. Conse- 
quently, a good estimate for the circuit complexity of a processor such as the 
one we propose for the algorithm component processor is 18,000 devices, not 
counting the RAM space, or the ROM space. 

Figure 17 is a short table showing the breakdown of the algorithm com- 
ponent processor into pieces of similar circuit complexity. Using figures of two 
devices per bit for the random access memory (DRAM), and one device per bit 
for the read-only memory (ROM), we find that 195K devices are required for the 
storage alone. Adding that value onto the 18K devices that form the rest of the 
algorithm component processor, we note that the total number of devices the 
processor requires is on the order of 215K. From the literature, we note that one
million device VLSI chips are already being produced in the research lab
[Gwynne, 1983], with ten million device VLSI chips promised in the time period
ranging from the year 1985 to the year 2001 [Uhr, 1984]. This means 4 algorithm 
component processors per chip at the one million devices per chip level, and 48 
algorithm component processors per chip at the ten million devices per chip 
level. For the 1000 processors needed for the contour surface display genera- 
tor, this means a total system size in the range of 250 to 21 VLSI chips. 
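
The device total can be reproduced from the memory dimensions listed in Figure 17. The sketch below is illustrative only: the names are ours, and the devices-per-bit figures are the ones assumed in the text.

```python
RAM_DEVICES_PER_BIT = 2
ROM_DEVICES_PER_BIT = 1

ram_devices = 2048 * 32 * RAM_DEVICES_PER_BIT        # RAM space
tree_rom_devices = 2048 * 16 * ROM_DEVICES_PER_BIT   # tree traversal tables
microcode_devices = 1024 * 32 * ROM_DEVICES_PER_BIT  # microcode ROM

storage_devices = ram_devices + tree_rom_devices + microcode_devices
LOGIC_DEVICES = 18_000   # registers, ALU, and random logic estimate

total_devices = storage_devices + LOGIC_DEVICES
print(storage_devices, total_devices)
```

The storage term accounts for more than ninety percent of the total, in line with the earlier remark that the ROMs and RAMs consume the majority of the chip area.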

8. Large System Discussion

For most of the uniprocessor, von Neumann world a system design consist- 
ing of 1,000 processors seems infeasible. In fact, even systems of 50 intercon- 
nected processors are not viewed as particularly viable. A large part of this 
skepticism derives from the difficulties involved in early multicomputer 
attempts such as the Illiac IV, and the Carnegie-Mellon C.mmp and Cm* projects
([Fuller, 1977], [Barnes, 1968], and [Wulf, 1972]). These initial multicomputer
efforts "peaked" at the level of around 50 processors. The focus of these
projects was to provide general purpose multicomputing. The economics of
the design and construction effort dictated this slant. None of these
multicomputers was particularly successful in fulfilling the need for general
purpose multicomputing, and none of them was particularly useful for any
specific application. Since the landmark 1977 article by Sutherland and Mead
[Sutherland, 1977], the economics and the focus of computer architecture
research have changed. The VLSI revolution heralded by Sutherland and Mead
has provided the capability for large scale, special architectures. One special
architecture, the Massively Parallel Processor (MPP) delivered to NASA in
December 1982, has 16,000 processors on 2,000 LSI chips [Potter, 1983]. Its
purpose is to solve large two-dimensional image processing applications in
real-time. It is "general purpose" in the sense that it is good for a wide
range of two-dimensional image processing applications, but it is still a
special architecture. The contour surface display generator is an even more
specialized architecture than the MPP, although it is just as feasible.

(1) RAM space -- (2 devices/bit)

        2048 x 32 bits                  = 131,072 devices

(2) ROM space -- (1 device/bit)

    -- tree tables

        2048 x 16 bits                  =  32,768 devices

    -- microcode

        1024 x 32 bits                  =  32,768 devices

    -- register block
    -- external registers
    -- data and control buses
                                        =  18,000 devices

Device total = 214,608 devices (215K devices)

Figure 17

Algorithm Component Processor's Circuit Complexity Estimate
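
The device total in Figure 17 can be re-derived from the listed memory sizes. The Python sketch below is only a tally under the figure's stated assumptions (2 devices per RAM bit, 1 device per ROM bit); the dictionary labels are illustrative, and the 18,000-device entry for the register block, external registers, and buses is taken from the figure as a lump sum rather than derived. Each ROM block comes to 32,768 devices, which is consistent with the 214,608 total.

```python
# Device-count tally for one algorithm component processor (Figure 17).
RAM_DEVICES_PER_BIT = 2   # from the figure
ROM_DEVICES_PER_BIT = 1   # from the figure

components = {
    "RAM (2048 x 32 bits)": 2048 * 32 * RAM_DEVICES_PER_BIT,
    "ROM tree tables (2048 x 16 bits)": 2048 * 16 * ROM_DEVICES_PER_BIT,
    "ROM microcode (1024 x 32 bits)": 1024 * 32 * ROM_DEVICES_PER_BIT,
    # Lump-sum estimate taken directly from the figure, not derived here.
    "registers and buses": 18_000,
}

device_total = sum(components.values())

if __name__ == "__main__":
    for name, count in components.items():
        print(f"{name:<34} {count:>8,}")
    print(f"{'Device total':<34} {device_total:>8,}")
```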

9. Conclusions 

This study has focused on the architectural specification and feasibility
determination of the real-time contour surface display generator. The
conclusion we draw is that such a multiprocessor can indeed be built. Having
made that assessment, we need to consider the next steps in this
research effort. Two directions come to mind, the second following directly
from the first. The first direction concerns the details of how the real-time con- 
tour surface display generator is interfaced to a display system. The impor- 
tance of this research direction becomes evident if we compute a value for the 
output data rate of the contour surface display generator. In Figure 15, the out- 
put is shown to be destined for a display device, with that output passing 
through a display controller. The assumption for that data transfer has been
that it is accomplished via a 32-bit wide DMA transfer mechanism similar in
operation to that of the DEC Unibus. Assuming that the output display is of
average size, 89,100 32-bit memory references (356,400 bytes), this is a data
rate of 10.7 megabytes per second. Delivering data to the display system at
10.7 megabytes per second is somewhat faster than current display system
technology allows. Compounding this problem is the fact that, besides
delivering the picture within the given time constraints, we also need to
maintain the functionality of the display system. This means that if we add
the contour surface display generator to a display system, we cannot reduce or
eliminate the display system's capability for real-time display rotation,
scaling, translation, clipping, and other assorted real-time operations. The
full specification of the architectural changes required for the display
system by the contour surface display generator is left as an area for
further study.
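
The output data rate quoted above can be reproduced with a short Python sketch. The picture size and the 10.7 megabytes per second figure come from the text; the picture rate is not stated in this passage, so the sketch infers it from those two numbers (it comes out at about 30 pictures per second). Treat that rate, and the names used here, as assumptions for illustration.

```python
BYTES_PER_REFERENCE = 4              # one 32-bit memory reference
REFERENCES_PER_PICTURE = 89_100      # average-size display, from the text
QUOTED_RATE_BYTES_PER_SEC = 10.7e6   # 10.7 megabytes per second, from the text

bytes_per_picture = REFERENCES_PER_PICTURE * BYTES_PER_REFERENCE

# Picture rate implied by the quoted figures (an inference, not a stated value).
implied_pictures_per_sec = QUOTED_RATE_BYTES_PER_SEC / bytes_per_picture


def data_rate_megabytes(pictures_per_sec: float) -> float:
    """Output data rate in megabytes (10**6 bytes) per second."""
    return bytes_per_picture * pictures_per_sec / 1e6


if __name__ == "__main__":
    print(f"bytes per picture:        {bytes_per_picture:,}")
    print(f"implied pictures per sec: {implied_pictures_per_sec:.2f}")
    print(f"rate at 30 pictures/sec:  {data_rate_megabytes(30):.1f} MB/s")
```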

Once we have answered the questions with respect to the contour surface 
display generator’s impact on the design of the display system, the second 
research direction is to examine other graphics algorithms for implementation 
in VLSI. If we then perform the same study of the interfacing of those special 
purpose display generators with the display system, we can see if there are any 
general principles we can establish. It is not until this question is answered in
the general case that we can actually begin the systematic implementation in
VLSI of special purpose, real-time display generators.